Open Source Toolkit for Statistical Machine Translation: Factored Translation Models and Confusion Network Decoding
نویسندگان
چکیده
The 2006 Language Engineering Workshop Open Source Toolkit for Statistical Machine Translation had the objective to advance the current state-of-the-art in statistical machine translation through richer input and richer annotation of the training data. The workshop focused on three topics: factored translation models, confusion network decoding, and the development of an open source toolkit that incorporates this advancements. This report describes the scientific goals, the novel methods, and experimental results of the workshop. It also documents details of the implementation of the open source toolkit.
منابع مشابه
Moses: Open Source Toolkit for Statistical Machine Translation
We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c) efficient data formats for translation models and language models. In addition to the SMT decoder, the toolkit also includes a wide variety of tools for training, tuning and applying the system to many tran...
متن کاملNiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierachical phrase-based model, and various syntaxbased models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers...
متن کاملAkamon: An Open Source Toolkit for Tree/Forest-Based Statistical Machine Translation
We describe Akamon, an open source toolkit for tree and forest-based statistical machine translation (Liu et al., 2006; Mi et al., 2008; Mi and Huang, 2008). Akamon implements all of the algorithms required for tree/forestto-string decoding using tree-to-string translation rules: multiple-thread forest-based decoding, n-gram language model integration, beamand cube-pruning, k-best hypotheses ex...
متن کاملPhrasal: A Toolkit for New Directions in Statistical Machine Translation
We present a new version of Phrasal, an open-source toolkit for statistical phrasebased machine translation. This revision includes features that support emerging research trends such as (a) tuning with large feature sets, (b) tuning on large datasets like the bitext, and (c) web-based interactive machine translation. A direct comparison with Moses shows favorable results in terms of decoding s...
متن کاملNcode: an Open Source Bilingual N-gram SMT Toolkit
This paper describes N, an open source statistical machine translation (SMT) toolkit for translation models estimated as n-gram language models of bilingual units (tuples). This toolkit includes tools for extracting tuples, estimating models and performing translation. It can be easily coupled to several other open source toolkits to yield a complete SMT pipeline. In this article, we review...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007